Self-slimmed Vision Transformer
Abstract
Vision transformers (ViTs) have become popular architectures and have outperformed convolutional neural networks (CNNs) on various vision tasks. However, such powerful transformers bring a huge computational burden because of the exhaustive token-to-token comparison. Previous works focus on dropping insignificant tokens to reduce the computational cost of ViTs. But when the dropping ratio increases, this hard manner will inevitably discard vital tokens, which limits its efficiency. To solve this issue, we propose a generic self-slimmed learning approach for vanilla ViTs, namely SiT. Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation. As a general method of token hard dropping, our TSM softly integrates redundant tokens into fewer informative ones. It can dynamically zoom visual attention without cutting off discriminative token relations in the images, even with a high slimming ratio. Furthermore, we introduce a concise Feature Recalibration Distillation (FRD) framework, wherein we design a reverse version of TSM (RTSM) to recalibrate the unstructured tokens in a flexible auto-encoder manner. Due to the similar structure between teacher and student, our FRD can effectively leverage structure knowledge for better convergence. Finally, we conduct extensive experiments to evaluate our SiT. The results demonstrate that our method can speed up ViTs by $\mathbf{1.7}\times$ with a negligible accuracy drop, and even speed up ViTs by $\mathbf{3.6}\times$ while maintaining $\mathbf{97}\%$ of their performance. Surprisingly, by simply arming LV-ViT with our SiT, we achieve new state-of-the-art performance on ImageNet. Code is available at https://github.com/Sense-X/SiT .
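The contrast the abstract draws between hard token dropping and soft token aggregation can be illustrated with a small sketch. This is not the authors' TSM. It is a minimal illustration under stated assumptions: the aggregation weights come from a single hypothetical projection `score_weights`, normalized with a softmax over the input tokens, and the names `slim_tokens` and `hard_drop` are invented for this example. Every output token is a weighted average of all input tokens, so no input is discarded outright.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def slim_tokens(tokens, score_weights):
    """Softly merge N input tokens into M output tokens.

    tokens:        N x D nested list, one row per token.
    score_weights: D x M nested list (hypothetical learned projection).
    Returns (slimmed, weights): M x D merged tokens and the M x N
    aggregation weights, where each row of weights sums to 1.
    """
    n, d = len(tokens), len(tokens[0])
    m = len(score_weights[0])
    # scores[j][i]: affinity of input token i to output slot j.
    scores = [
        [sum(tokens[i][k] * score_weights[k][j] for k in range(d)) for i in range(n)]
        for j in range(m)
    ]
    # Normalize over the input tokens so each output is a convex combination.
    weights = [softmax(row) for row in scores]
    slimmed = [
        [sum(weights[j][i] * tokens[i][k] for i in range(n)) for k in range(d)]
        for j in range(m)
    ]
    return slimmed, weights


def hard_drop(tokens, importance, keep):
    """Baseline for comparison: keep the `keep` highest-scoring tokens, drop the rest."""
    top = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]  # preserve original token order


random.seed(0)
N, D, M = 8, 4, 2  # 8 input tokens of width 4, slimmed to 2 tokens
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
score_weights = [[random.gauss(0, 1) for _ in range(M)] for _ in range(D)]
slimmed, weights = slim_tokens(tokens, score_weights)
print(len(slimmed), len(slimmed[0]))  # prints: 2 4
```

In this sketch the slimming ratio is simply M / N. With `hard_drop`, information in the discarded tokens is lost entirely, whereas `slim_tokens` lets every input contribute to the output with a nonzero weight.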
Similar resources
Accelerating 'Intelligent Scissors' Using Slimmed Graphs
In this paper, we describe an acceleration technique for the semi-automatic image segmentation algorithm, intelligent scissors. Using intelligent scissors, user can accurately and interactively extract the object from the digitized image. However, the original algorithm suffers from slow performance when large images are treated. In practice, pixels within the non-edge regions are seldom involv...
The Self-Star Vision
Achieving various self-* properties has been a grand challenge of computer science and engineering since the building of the first computer. The latest reincarnation of this challenge is due to the fact that large, complex and dynamic information systems have suddenly become a key part of the infrastructure of modern societies. Accordingly, it has become very important to be able to build, manage,...
Self-awareness affects vision
References 1. Deuschl, G., Schade-Brittinger, C., Krack, P., Volkmann, J., Schafer, H., Botzel, K., Daniels, C., Deutschlander, A., Dillmann, U., Eisner, W., et al. (2006). A randomized trial of deep-brain stimulation for Parkinson's disease. N. Eng. J. Med. 355, 896–908. 2. Ali, F.R., Michell, A.W., Barker, R.A., and Carpenter, R.H.S. (2006). The use of quantitative oculometry in the assessment...
Self-adaptive Vision System
Light conditions represent an important part of every vision application. This paper describes one active behavioral scheme of one particular active vision system. This behavioral scheme enables an active system to adapt to current environmental conditions by constantly validating the amount of the reflected light using luminance meter and dynamically changed significant vision parameters. The ...
A self-stabilizing transformer for population protocols with covering
Developing self-stabilizing solutions is considered to be more challenging and complicated than developing classical solutions, where a proper initialization of the variables can be assumed. Hence, to ease the task of the developers, some automatic techniques have been proposed to design self-stabilizing algorithms. In this paper, we propose an automatic transformer for algorithms in an extende...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: ['1611-3349', '0302-9743']
DOI: https://doi.org/10.1007/978-3-031-20083-0_26